550 research outputs found

    Shiny Dashboard for Monitoring the COVID-19 Pandemic in Spain

    [Abstract] Real-time monitoring of events such as the recent pandemic caused by COVID-19, and visualization of the effects of its spread, has highlighted the need to join forces across fields already accustomed to working hand in hand, such as medicine, biology and information technology. Our dashboard is developed in R and uses the Shiny package to provide an attractive visualization tool: COVID-19 Spain automatically produces daily updates from official sources (Carlos III Health Institute and Ministry of Health, Consumer Affairs and Welfare) on cases, deaths, recoveries, ICU admissions and cumulative daily incidence. In addition, it shows on a georeferenced map the evolution of active, new and cumulative cases by autonomous community, allowing users to travel in time from the origin of the pandemic to the last available day. This makes it possible to visualize the spread of infections and serves as visual support for epidemiological studies.
    Funding: Xunta de Galicia (ED431G/01); Xunta de Galicia (ED431D 2017/16); Xunta de Galicia (ED431C 2018/4)
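The dashboard's R/Shiny code is not shown here. As a hedged illustration of the derived series it displays, the Python sketch below computes daily new cases as first differences of a cumulative series and active cases as cumulative cases minus recoveries and deaths. Both definitions, the field names (`cases`, `recovered`, `deaths`) and the toy data are assumptions, not the dashboard's actual implementation.

```python
from typing import Dict, List


def daily_new(cumulative: List[int]) -> List[int]:
    """Daily new cases as first differences of a cumulative series."""
    if not cumulative:
        return []
    # The first day has no previous value, so its new count equals its cumulative count.
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def active_cases(cases: List[int], recovered: List[int], deaths: List[int]) -> List[int]:
    """Active cases per day, assuming active = cumulative cases - recovered - deaths."""
    return [c - r - d for c, r, d in zip(cases, recovered, deaths)]


def summarize(region: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Cumulative, new and active series for one autonomous community (hypothetical layout)."""
    return {
        "accumulated": region["cases"],
        "new": daily_new(region["cases"]),
        "active": active_cases(region["cases"], region["recovered"], region["deaths"]),
    }


if __name__ == "__main__":
    # Hypothetical toy data for a single region over four days.
    region = {"cases": [10, 25, 40, 60], "recovered": [0, 5, 10, 20], "deaths": [0, 1, 2, 3]}
    print(summarize(region))
```

Official series are sometimes revised downward, so first differences can turn negative; a real pipeline would have to decide how to handle such corrections.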

    Prediction of Peptide Vascularization Inhibitory Activity in Tumor Tissue as a Possible Target for Cancer Treatment

    [Abstract] In silico prediction of metabolic activities is crucial for exploring every research possibility without exceeding experimental costs. In cancer research in particular, predicting certain activities can greatly help the discovery of new treatments. In this work, we propose using Machine Learning (ML) to predict the anti-angiogenic activity of peptides, an activity currently being exploited in cancer treatment with promising results. From a list of peptide sequences, three types of molecular descriptors (AAC, DC and TC) were obtained and used to train different ML algorithms. After a feature selection process, several models were obtained whose predictive performance surpassed the current state of the art. These results show that ML is useful for classifying new peptides and predicting their activity, making experimental screening cheaper and faster.
    Funding: Instituto Carlos III (PI17/01826); Xunta de Galicia (Ref. ED431G/01); Xunta de Galicia (ED431D 2017/16); Red Gallega de Investigación sobre Cáncer Colorrectal (Ref. ED431D 2017/23); Ministerio de Economía y Competitividad (UNLC08-1E-002); Ministerio de Economía y Competitividad (UNLC13-13-3503); Ministerio de Economía y Competitividad (FJCI- 2015-2607)
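A minimal sketch of the three descriptor families, assuming that AAC, DC and TC denote amino acid, dipeptide and tripeptide composition (the usual meaning of these acronyms in peptide ML work; the abstract does not expand them). Each descriptor is the relative frequency of an overlapping k-mer (k = 1, 2, 3) over the 20 standard residues, giving 20 + 400 + 8,000 features per peptide. The function names are hypothetical.

```python
from itertools import product
from typing import Dict

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(sequence: str, k: int) -> Dict[str, float]:
    """Relative frequency of every possible k-mer among the overlapping k-mers of `sequence`.

    k=1 -> amino acid composition (AAC), k=2 -> dipeptide composition (DC),
    k=3 -> tripeptide composition (TC), under the assumption stated above.
    """
    sequence = sequence.upper()
    counts = {"".join(p): 0 for p in product(AMINO_ACIDS, repeat=k)}
    total = 0
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # k-mers containing non-standard residues are ignored
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


def peptide_descriptors(sequence: str) -> Dict[str, float]:
    """Concatenate AAC, DC and TC into one flat feature dictionary (hypothetical helper)."""
    features: Dict[str, float] = {}
    for k, prefix in ((1, "AAC"), (2, "DC"), (3, "TC")):
        for kmer, value in kmer_composition(sequence, k).items():
            features[f"{prefix}_{kmer}"] = value
    return features


if __name__ == "__main__":
    feats = peptide_descriptors("ACDA")  # toy peptide, not from the study
    print(len(feats), feats["AAC_A"], feats["TC_ACD"])
```

With 8,420 mostly sparse features per peptide, a feature selection step before training, like the one described in the abstract, is a natural fit.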

    Gene Signatures Research Involved in Cancer Using Machine Learning

    [Abstract] With mass sequencing techniques becoming cheaper and computing technologies capable of analyzing huge amounts of data on the rise, it is now necessary for both fields to benefit from each other. Transcriptomics, in this case, is the branch of biology focused on the study of mRNA molecules, among others. Quantifying these molecules tells us how strongly a gene is being expressed at a given moment. Expression information for the approximately 20,000 genes in the human genome is a highly useful source of information for studying certain conditions and/or pathologies. In this work, patient expression (omic) data have been used to offer a new analysis methodology based on Machine Learning. The results of this methodology were compared with those of a conventional methodology to observe how they differed and how they resembled each other. These techniques therefore offer a new mechanism for searching for gene signatures involved, in this case, in cancer.
    Funding: Instituto de Salud Carlos III (PI17/01826); Xunta de Galicia (ED431D 2017/16); Red Gallega de Investigación sobre Cáncer Colorrectal (ED431D 2017/23); Ministerio de Economía y Competitividad (UNLC08-1E-002); Ministerio de Economía y Competitividad (UNLC13-13-3503); Ministerio de Economía y Competitividad (FJCI- 2015-26071); Xunta de Galicia (Ref ED431G/0)
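The abstract does not say how the two signatures were compared. One simple, hedged way to quantify how they differ and resemble each other is set overlap: the shared genes, the genes unique to each method, and the Jaccard index (size of the intersection over size of the union). The gene identifiers used below are placeholders, not results from the study.

```python
from typing import Dict, Iterable, List, Union


def compare_signatures(
    ml_genes: Iterable[str], conventional_genes: Iterable[str]
) -> Dict[str, Union[List[str], float]]:
    """Overlap between an ML-derived and a conventionally derived gene signature."""
    ml, conv = set(ml_genes), set(conventional_genes)
    union = ml | conv
    return {
        "shared": sorted(ml & conv),
        "only_ml": sorted(ml - conv),
        "only_conventional": sorted(conv - ml),
        # Jaccard index: |intersection| / |union|; 1.0 by convention when both sets are empty.
        "jaccard": len(ml & conv) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    # Placeholder gene identifiers, not genes reported by the study.
    print(compare_signatures(["GENE_A", "GENE_B", "GENE_C"],
                             ["GENE_B", "GENE_C", "GENE_D", "GENE_E"]))
```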

    Machine Learning Analysis of the Human Infant Gut Microbiome Identifies Influential Species in Type 1 Diabetes

    [Abstract] Diabetes is a disease closely linked to genetics and epigenetics, yet the mechanisms underlying its onset and/or progression have not always been fully elucidated. In recent years, a large number of studies have shown that changes in the balance of the microbiota can contribute to a wide range of diseases, including diabetes. Machine Learning (ML) techniques can identify complex, non-linear patterns and relationships within a data set and extract intrinsic knowledge without any biological assumptions about the data. At the same time, mass sequencing techniques make it possible to obtain the metagenomic profile of an individual, whether of a body part, organ or tissue, and thus identify its microbial composition. The strong development of both technologies in their respective fields leads naturally to combining them to try to identify the basis of a complex disease such as diabetes. To this end, a Random Forest model was developed at different taxonomic levels, obtaining AUC values above 0.80 at family level and above 0.98 at species level, following a strict experimental design to ensure that results are compared under equal conditions. The analysis shows that, in infants with T1D, the species Bacteroides uniformis, Bacteroides dorei and Bacteroides thetaiotaomicron are reduced in the microbiota, while the population of Prevotella copri increases slightly and that of Bacteroides vulgatus is much higher. Finally, thanks to the more specific metagenomic signature at species level, a model was generated to predict seroconverted patients not previously diagnosed with diabetes but who have expressed at least two of the autoantibodies analysed.
    This work was supported by the “Collaborative Project in Genomic Data Integration (CICLOGEN)” PI17/01826, funded by the Carlos III Health Institute from the Spanish National plan for Scientific and Technical Research and Innovation 2013–2016 and the European Regional Development Funds (FEDER) “A way to build Europe”, and by the General Directorate of Culture, Education and University Management of Xunta de Galicia, Spain (Ref. ED431D 2017/16), the “Galician Network for Colorectal Cancer Research, Spain” (Ref. ED431D 2017/23) and Competitive Reference Groups, Spain (Ref. ED431C 2018/49). CITIC, as a Research Center accredited by the Galician University System, is funded by “Consellería de Cultura, Educación e Universidades from Xunta de Galicia, Spain”, 80% through ERDF Funds, Spain, ERDF Operational Programme Galicia 2014–2020, and the remaining 20% by “Secretaría Xeral de Universidades, Spain” (Grant ED431G 2019/01). The calculations were performed on resources provided by the Spanish Ministry of Economy and Competitiveness via funding of the unique installation BIOCAI (UNLC08-1E-002, UNLC13-13-3503) and the European Regional Development Funds (FEDER). The funding body did not have a role in the experimental design; data collection, analysis and interpretation; and writing of this manuscript. Funding for open access charge: Universidade da Coruña/CISUG.
    Funding: Xunta de Galicia (ED431D 2017/16); Xunta de Galicia (ED431D 2017/23); Xunta de Galicia (ED431C 2018/49); Xunta de Galicia (ED431G 2019/0)
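Training the Random Forest at different taxonomic levels implies collapsing species-level abundances to coarser ranks. The sketch below assumes each sample is a dict mapping a (family, genus, species) lineage to a relative abundance, sums abundances per taxon at the requested rank, and builds a feature matrix with a fixed column order. The species names come from the abstract; the abundance values and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Lineage = Tuple[str, str, str]  # (family, genus, species)
RANKS = ("family", "genus", "species")


def aggregate(profile: Dict[Lineage, float], rank: str) -> Dict[str, float]:
    """Sum species-level relative abundances up to the requested taxonomic rank."""
    depth = RANKS.index(rank)  # raises ValueError for an unknown rank
    totals: Dict[str, float] = defaultdict(float)
    for lineage, abundance in profile.items():
        totals[lineage[depth]] += abundance
    return dict(totals)


def feature_matrix(
    profiles: Sequence[Dict[Lineage, float]], rank: str
) -> Tuple[List[str], List[List[float]]]:
    """One row per sample, one column per taxon seen at `rank` in any sample (absent -> 0.0)."""
    aggregated = [aggregate(p, rank) for p in profiles]
    columns = sorted({taxon for sample in aggregated for taxon in sample})
    rows = [[sample.get(taxon, 0.0) for taxon in columns] for sample in aggregated]
    return columns, rows


if __name__ == "__main__":
    # Hypothetical abundances for two infants.
    infant_1 = {
        ("Bacteroidaceae", "Bacteroides", "Bacteroides dorei"): 0.2,
        ("Bacteroidaceae", "Bacteroides", "Bacteroides uniformis"): 0.1,
        ("Prevotellaceae", "Prevotella", "Prevotella copri"): 0.3,
    }
    infant_2 = {("Bacteroidaceae", "Bacteroides", "Bacteroides vulgatus"): 0.5}
    print(feature_matrix([infant_1, infant_2], "family"))
```

Each row of the resulting matrix would then be one infant's profile at that rank, ready to be passed to a classifier such as a Random Forest.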

    Identification of Prevotella, Anaerotruncus and Eubacterium Genera by Machine Learning Analysis of Metagenomic Profiles for Stratification of Patients Affected by Type I Diabetes

    [Abstract] Previous works have reported different bacterial strains and genera as the cause of various clinical pathological conditions. In our approach, a machine learning-based model capable of discerning between patients affected by type I diabetes and controls was generated from the fecal metagenomic profiles of newborns. A random forest algorithm achieved an AUROC of 0.915. Automating these processes and supporting clinical decision-making with metagenomic variables of interest may lower experimental costs in the diagnosis of complex diseases that are highly prevalent worldwide.
    This work was supported by the “Collaborative Project in Genomic Data Integration (CICLOGEN)” PI17/01826, funded by the Carlos III Health Institute from the Spanish National plan for Scientific and Technical Research and Innovation 2013–2016 and the European Regional Development Funds (FEDER) “A way to build Europe”, and by the General Directorate of Culture, Education and University Management of Xunta de Galicia (Ref. ED431G/01, ED431D 2017/16), the “Galician Network for Colorectal Cancer Research” (Ref. ED431D 2017/23) and Competitive Reference Groups (Ref. ED431C 2018/49). The funding body did not have a role in the experimental design; data collection, analysis and interpretation; and writing of this manuscript.
    Funding: Xunta de Galicia (ED431G/01); Xunta de Galicia (ED431D 2017/16); Xunta de Galicia (ED431D 2017/23); Xunta de Galicia (ED431C 2018/4)
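An AUROC such as the 0.915 reported above can be read as the probability that a randomly chosen T1D patient receives a higher model score than a randomly chosen control. The sketch below computes it directly from that definition (the Mann-Whitney formulation, with ties counted as one half). It is an illustrative quadratic-time version under that assumption, not the evaluation code used in the study.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (labels: 1 = T1D, 0 = control)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy scores: three of the four positive/negative pairs are ordered correctly.
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```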

    Kernel-Based Techniques for Texture Analysis in Biomedical Imaging

    [Abstract] In real-world problems, it is important to study the relevance of all the variables obtained so that noise can be removed; this is where variable selection techniques come in. These techniques aim to find the subset of variables that best describes the useful information contained in the data, thereby improving performance. In high-dimensional spaces, kernel-based techniques are of special interest, as they have proven highly efficient thanks to their ability to generalize in such spaces. This work presents a new approach to texture analysis in biomedical imaging that uses kernel-based techniques to integrate different types of texture data and select the most representative variables, with the aim of improving both classification results and the interpretability of the selected variables. To validate this proposal, an experimental design was formalized with four distinct phases: 1) data extraction; 2) data pre-processing; 3) learning; and 4) selection of the best model, ensuring the reproducibility of the results while allowing comparisons under equal conditions. Two-dimensional electrophoresis gel images were used for this purpose.
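The work does not specify which kernel-based selection criterion it uses. As one well-known example, swapped in here purely for illustration, the sketch ranks features by kernel-target alignment: for each feature it builds an RBF kernel matrix over the samples and measures how closely that matrix matches the ideal label kernel yyᵀ (labels in {-1, +1}). The `gamma` value, the function names and the toy texture features are assumptions.

```python
import math
from typing import List, Sequence


def rbf_kernel(values: Sequence[float], gamma: float = 1.0) -> List[List[float]]:
    """RBF kernel matrix k(a, b) = exp(-gamma * (a - b)^2) for one feature."""
    return [[math.exp(-gamma * (a - b) ** 2) for b in values] for a in values]


def alignment(kernel: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Kernel-target alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F) for labels in {-1, +1}."""
    n = len(labels)
    inner = sum(kernel[i][j] * labels[i] * labels[j] for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(v * v for row in kernel for v in row))
    return inner / (norm_k * n)  # ||yy^T||_F = n when every label is +1 or -1


def rank_features(samples: Sequence[Sequence[float]], labels: Sequence[int],
                  gamma: float = 1.0) -> List[int]:
    """Feature indices sorted from most to least aligned with the labels."""
    scores = []
    for f in range(len(samples[0])):
        column = [row[f] for row in samples]
        scores.append((alignment(rbf_kernel(column, gamma), labels), f))
    return [f for _, f in sorted(scores, reverse=True)]


if __name__ == "__main__":
    # Hypothetical texture features: feature 0 separates the classes, feature 1 is noise.
    X = [[0.0, 0.3], [0.1, 0.9], [5.0, 0.2], [5.1, 0.8]]
    y = [1, 1, -1, -1]
    print(rank_features(X, y))
```

Selecting the top-ranked features and then training a kernel classifier on them is one way the learning and model-selection phases could consume this ranking.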

    Population Subset Selection for the Use of a Validation Dataset for Overfitting Control in Genetic Programming

    [Abstract] Genetic Programming (GP) is a technique that can solve different problems through the evolution of mathematical expressions. However, one of its main issues in practice is its tendency to overfit the data. Using a validation dataset is a common way to prevent overfitting in many Machine Learning (ML) techniques, including GP. Yet one key point differentiates GP from other ML techniques: instead of training a single model, GP evolves a population of models. The validation dataset can therefore be used in several ways, because any of the evolved models could be evaluated on it. This work explores using the validation dataset not only on the training-best individual but also on a subset of the training-best individuals of the population. The study was conducted on five well-known databases covering regression and classification tasks. In most cases, the results point to an improvement when the validation dataset is applied to a subset of the population rather than only to the training-best individual; this also reduces the number of nodes and, consequently, the complexity of the expressions.
    Funding: Xunta de Galicia (ED431G/01); Xunta de Galicia (ED431D 2017/16); Xunta de Galicia (ED431C 2018/49); Xunta de Galicia (ED431D 2017/23); Instituto de Salud Carlos III (PI17/0182)
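A minimal sketch of the selection step compared in this work, assuming lower error is better and that the validation error of any individual can be computed on demand. With `subset_size=1` it reduces to the conventional approach (validate only the training-best individual); larger values validate a subset of the training-best individuals. The `Individual` class, the toy expressions and the error values are hypothetical, and the evolutionary loop itself is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Individual:
    expression: str     # evolved expression, kept as text for illustration
    train_error: float  # error on the training set (lower is better)


def select_with_validation(
    population: Sequence[Individual],
    validation_error: Callable[[Individual], float],
    subset_size: int,
) -> Individual:
    """Return the individual with the lowest validation error among the
    `subset_size` individuals with the lowest training error.

    subset_size=1 corresponds to validating only the training-best individual.
    """
    if subset_size < 1:
        raise ValueError("subset_size must be at least 1")
    candidates = sorted(population, key=lambda ind: ind.train_error)[:subset_size]
    return min(candidates, key=validation_error)


if __name__ == "__main__":
    population = [
        Individual("x*x*x + x", train_error=0.01),  # best on training data, but overfits
        Individual("x*x", train_error=0.05),
        Individual("x", train_error=0.50),
    ]
    val = {"x*x*x + x": 0.40, "x*x": 0.08, "x": 0.60}  # hypothetical validation errors
    for k in (1, 2):
        print(k, select_with_validation(population, lambda ind: val[ind.expression], k).expression)
```

In this toy population, widening the validated subset from one to two individuals picks the simpler `x*x`, echoing the lower expression complexity reported in the abstract; it is an illustration only.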

    Machine Learning Analysis of TCGA Cancer Data

    [Abstract] In recent years, machine learning (ML) researchers have turned their attention to biological problems that are difficult to analyse with standard approaches. Large initiatives such as The Cancer Genome Atlas (TCGA) have made omic data available for training these algorithms. To survey the state of the art, this review covers the main works that have applied ML to TCGA data. First, the principal discoveries made by the TCGA consortium are presented. With these foundations established, we turn to the main objective of this study: identifying and discussing the works that have used TCGA data to train different ML approaches. After reviewing more than 100 papers, we classified them according to three pillars: the type of tumour, the type of algorithm and the biological problem predicted. One of our conclusions is that a large share of studies rely on two major algorithms: Random Forest and Support Vector Machines. We also observe a rise in the use of deep artificial neural networks and, notably, an increase in integrative models for multi-omic data analysis. Different biological conditions are a consequence of molecular homeostasis, driven by protein-coding regions, regulatory elements and the surrounding environment. A large number of works use gene expression data, which appear to be the data type preferred by researchers for training the different models. The biological problems addressed were classified into five types: prognosis prediction, tumour subtypes, microsatellite instability (MSI), immunological aspects and certain pathways of interest. A clear trend was detected in the prediction of these conditions according to tumour type: a greater number of works have focused on the BRCA cohort, while survival-specific works, for example, centred on the GBM cohort because of its large number of events. Throughout this review, the works and methodologies used to study TCGA cancer data are examined in depth. Finally, we intend this work to serve as a basis for future research in this field.
    This work was supported by the “Collaborative Project in Genomic Data Integration (CICLOGEN)” PI17/01826, funded by the Carlos III Health Institute from the Spanish National plan for Scientific and Technical Research and Innovation 2013–2016 and the European Regional Development Funds (FEDER) “A way to build Europe”, and by the General Directorate of Culture, Education and University Management of Xunta de Galicia (Ref. ED431D 2017/16), the “Galician Network for Colorectal Cancer Research” (Ref. ED431D 2017/23) and Competitive Reference Groups (Ref. ED431C 2018/49). CITIC, as a Research Center accredited by the Galician University System, is funded by “Consellería de Cultura, Educación e Universidades from Xunta de Galicia”, 80% through ERDF Funds, ERDF Operational Programme Galicia 2014–2020, and the remaining 20% by “Secretaría Xeral de Universidades” (Grant ED431G 2019/01). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
    Funding: Xunta de Galicia (ED431D 2017/16); Xunta de Galicia (ED431D 2017/23); Xunta de Galicia (ED431C 2018/49); Xunta de Galicia (ED431G 2019/0)

    Comparison of Outlier-Tolerant Models for Measuring Visual Complexity

    [Abstract] Estimating the visual complexity of an image in terms of impact or aesthetic preference can be widely applicable in areas such as psychology or marketing. To this end, fields such as Computer Vision have focused on identifying features and computational models that yield satisfactory results. This paper studies the application of recent ML models to input images rated by humans and characterized by features related to visual complexity. The experiments confirmed that one of these methods, Correlation by Genetic Search (CGS), which searches for minimal sets of features that maximize the correlation of the model with the input data, predicted human ratings of image visual complexity better than any other model reported to date in terms of correlation, RMSE and the minimum number of features required. In addition, the variability of these measures was studied by eliminating images considered outliers in previous studies, which showed the robustness of the method in selecting the most important variables for the prediction.
    The Carlos III Health Institute from the Spanish National plan for Scientific and Technical Research and Innovation 2013-2016 and the European Regional Development Funds (FEDER) “A way to build Europe” support this work through the “Collaborative Project in Genomic Data Integration (CICLOGEN)” PI17/01826. This work has also been supported by the General Directorate of Culture, Education and University Management of Xunta de Galicia (Ref. ED431G/01, ED431D 2017/16), the “Galician Network for Colorectal Cancer Research” (Ref. ED431D 2017/23) and Competitive Reference Groups (Ref. ED431C 2018/49). In addition, the unique installation BIOCAI (UNLC08-1E-002, UNLC13-13-3503) was funded by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Funds (FEDER).
    Funding: Xunta de Galicia (ED431G/01); Xunta de Galicia (ED431D 2017/16); Xunta de Galicia (ED431D 2017/23); Xunta de Galicia (ED431C 2018/4)
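The models are compared by the correlation and RMSE between predicted and human ratings, with and without the images considered outliers. A hedged sketch of those two metrics follows, plus a helper that recomputes them after excluding a given set of image indices. The outlier indices and ratings are placeholders, and CGS itself is not reproduced here.

```python
import math
from typing import Dict, Iterable, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean squared error between predictions and human ratings."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def evaluate(predicted: Sequence[float], observed: Sequence[float],
             outliers: Iterable[int] = ()) -> Dict[str, float]:
    """Correlation and RMSE after dropping the images whose indices are in `outliers`."""
    excluded = set(outliers)
    keep = [i for i in range(len(observed)) if i not in excluded]
    p = [predicted[i] for i in keep]
    o = [observed[i] for i in keep]
    return {"correlation": pearson(p, o), "rmse": rmse(p, o)}


if __name__ == "__main__":
    # Placeholder predictions and human ratings; index 3 plays the role of an outlier image.
    pred, human = [1.0, 2.0, 3.0, 10.0], [1.0, 2.0, 3.0, 4.0]
    print(evaluate(pred, human), evaluate(pred, human, outliers=[3]))
```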

    Wi-Fi Handshake: analysis of password patterns in Wi-Fi networks

    [Abstract] This article seeks to provide a snapshot of the security of Wi-Fi access points in the metropolitan area of A Coruña. First, we discuss the options for obtaining a tool that collects and stores auditable information from Wi-Fi networks, such as location, signal strength, security protocol or the list of connected clients. Next, an analysis is carried out to identify password patterns in Wi-Fi networks using the WEP, WPA and WPA2 security protocols. For this purpose, the password recovery tool Hashcat was used to run dictionary and brute-force attacks, among others, with various word collections. The coverage of the access points whose passwords were recovered is displayed on a heat map representing different levels of signal quality according to signal strength. From the handshakes obtained, and by means of brute force, we try to crack as many passwords as possible in order to build a targeted dictionary contextualized both by geographical location and by the nature of the access point's owner. Finally, we propose a contextualized grammar that minimizes the dictionary size relative to the most widely used dictionaries while matching the combined cracking capacity of all of them.
    This work is supported by the General Directorate of Culture, Education and University Management of Xunta de Galicia (Ref. ED431G/01, ED431D 2017/16), the Galician Network for Colorectal Cancer Research (Ref. ED431D 2017/23), Competitive Reference Groups (Ref. ED431C 2018/49) and the Spanish Ministry of Economy and Competitiveness via funding of the unique installation BIOCAI (UNLC08-1E-002, UNLC13-13-3503) and the European Regional Development Funds (FEDER). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
    Funding: Xunta de Galicia (ED431G/01); Xunta de Galicia (ED431D 2017/16); Xunta de Galicia (ED431D 2017/23); Xunta de Galicia (ED431C 2018/4)
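One common way to look for password patterns, assumed here rather than taken from the article, is to map each recovered password to a structural mask in Hashcat's notation (`?l` lowercase, `?u` uppercase, `?d` digit, `?s` special character) and count how often each mask occurs. Frequent masks suggest which structures a contextualized grammar should cover. Characters outside those four sets are marked `?b` here as a simplification, and the sample passwords are invented.

```python
import string
from collections import Counter
from typing import Iterable, List, Tuple


def to_mask(password: str) -> str:
    """Map each character to a Hashcat-style charset placeholder."""
    parts = []
    for ch in password:
        if ch in string.ascii_lowercase:
            parts.append("?l")
        elif ch in string.ascii_uppercase:
            parts.append("?u")
        elif ch in string.digits:
            parts.append("?d")
        elif ch in string.punctuation or ch == " ":
            parts.append("?s")
        else:
            parts.append("?b")  # simplification: any other character (e.g. non-ASCII)
    return "".join(parts)


def top_masks(passwords: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """The n most frequent structural masks among the recovered passwords."""
    return Counter(to_mask(p) for p in passwords).most_common(n)


if __name__ == "__main__":
    # Invented examples, not passwords from the study.
    print(top_masks(["coruna2019", "galicia2020", "Abc123!", "riazor1906"], n=2))
```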